In this tutorial, We will see how to build a severity index using a key informant interview dataset. Building such index, also called composite indicator, is among the regular and expected task of any humanitarian data analyst.The objective is to summarise information into a simple indicator that can reduce information overflow. Severity index informs the sectoral and inter-sectoral discussions taking place as part of the Humanitarian Needs Overview (HNO) process, in particular facilitating a comparison of needs across geographic areas.

An approach to follow for this is described in the Annex1: Technical Guidance of the 2014 GUIDANCE: Humanitarian Needs Comparison Tool. Those guidance were provided with a complex and heavy excel tool that includes many macros script and 13 pages of narrative explanation to explain how it works.

Rather than using script through complicated Macros, using R is a good and easier alternative. Build those severity index though a documented and linear process brings more credibility in the severity index generation as the analysis becomes de facto entirely reproducible (as a difference with complex excel spreadsheet with macro formula spread around multiple worksheet). Existing packages such as R Compind package for composite indicators have already been developed and released as open source and therefore ready to be leveraged without any software procurement.

As the analysis workflow shall remain the same, anyone can take and study this tutorial source code, then change the dataset behind with their own in order to re-use it.

Why do you need statistical analysis for indicator composition?

When developing severity index, each steps comes with methodological questions:

The 2014 guidance advise to determine the weight of each sub-indicator through (often unreachable…) expert consensus. Such consensus, when obtained, usually takes (very long…) consultation and consideration with concerned stakeholders that are most of the time not statisticians themselves and will provide mostly guesstimates for each weight. This approach in the academic world is known as a budget allocation process (BAP), where set of chosen decision makers (e.g. a panel of experts) is given ‘n’ points to distribute to the indicators, or groups of indicators (e.g. dimensions), and then an average of the experts’ choices is used. An alternative to BAP, called “analytic hierarchy process - AHP”, is collect each expert opinion in a structured way through pairwise comparison. A rule of thumb for any expert judgement process is to have fewer than 10 indicators so that the approach is optimally executed cognitively, as such most of the time, they are not appropriate for the development of humanitarian severity index.

Arithmetic and geometric aggregation with equal weighting are originally also suggested in the guidance may also comes with assumption of compensability (i.e. Perfect substituability – compensates bad performance in one aspect with good performance in another) that are often not verified. Such points have been largely discussed in literature (see Benini, Aldo (2016). Severity measures in humanitarian needs assessments - Purpose,measurement, integration. Technical note - 8 August 2016 -. Geneva, Assessment CapacitiesProject (ACAPS)).

The OECD handbook for composite Indicators calculation also taught through the European Joint Research Training on Composite Indicators present a series of alternatives that can inform / help confirming or triangulate “expert-judgement” . As we will see below, some statistical method are easily accessible for R users in order to get indicators weights.

Installing packages

To get started, if you don’t have them already, the following packages are necessary.

## This function will retrieve the packae if they are not yet installed.
using <- function(...) {
    libs <- unlist(list(...))
    req <- unlist(lapply(libs,require,character.only = TRUE))
    need <- libs[req == FALSE]
    if (length(need) > 0) { 
        install.packages(need)
        lapply(need,require,character.only = TRUE)
    }
}

## Getting all necessary package
using("readr","readxl","dplyr","ggplot2","corrplot","psych","bbplot",
      "scales","ggiraph","ggrepel","Compind","ggcorrplot","kableExtra",
      "reshape2","qgraph", "sf","cartography",
      "raster", "dismo", "xlsx", "rgdal")

rm(using)




# This small function is used to have nicely left align text within 
# charts produced with ggplot2
left_align <- function(plot_name, pieces){
  grob <- ggplot2::ggplotGrob(plot_name)
  n <- length(pieces)
  grob$layout$l[grob$layout$name %in% pieces] <- 2
  return(grob)
}

unhcr_style <- function() {
  font <- "Lato"
  ggplot2::theme(
    
#This sets the font, size, type and colour of text for the chart's title
  plot.title = ggplot2::element_text(family = font, size = 20,
                                     face = "bold", color = "#222222"),

# This sets the font, size, type and colour of text for the chart's subtitle,
# as well as setting a margin between the title and the subtitle
  plot.subtitle = ggplot2::element_text(family = font, size = 16, 
                                        margin = ggplot2::margin(9,0,9,0)),
  plot.caption = ggplot2::element_blank(),

# This sets the position and alignment of the legend, removes a title 
# and backround for it and sets the requirements for any text within the legend.
# The legend may often need some more manual tweaking when it comes to its exact
# position based on the plot coordinates.
  legend.position = "top",
  legend.text.align = 0,
  legend.background = ggplot2::element_blank(),
  legend.title = ggplot2::element_blank(),
  legend.key = ggplot2::element_blank(),
  legend.text = ggplot2::element_text(family = font, size = 13, color = "#222222"),

# This sets the text font, size and colour for the axis test, as well as setting
# the margins and removes lines and ticks. In some cases, axis lines and axis 
# ticks are things we would want to have in the chart
  axis.title = ggplot2::element_blank(),
  axis.text = ggplot2::element_text(family = font, size = 13, color = "#222222"),
  axis.text.x = ggplot2::element_text(margin = ggplot2::margin(5, b = 10)),
  axis.ticks = ggplot2::element_blank(),
  axis.line = ggplot2::element_blank(),

# This removes all minor gridlines and adds major y gridlines. 
# In many cases you will want to change this to remove y gridlines and add x gridlines. 
  panel.grid.minor = ggplot2::element_blank(),
  panel.grid.major.y = ggplot2::element_line(color = "#cbcbcb"),
  panel.grid.major.x = ggplot2::element_blank(),

# This sets the panel background as blank, removing the standard grey
# ggplot background colour from the plot
  panel.background = ggplot2::element_blank(),

# This sets the panel background for facet-wrapped plots to white, 
# removing the standard grey ggplot background colour and sets the title size 
# of the facet-wrap title to font size 22
  strip.background = ggplot2::element_rect(fill = "white"),
  strip.text = ggplot2::element_text(size  = 13,  hjust = 0)
  )
}

Data set - Key Informant Interview for Hurricane Dorian, Bahamas

This dataset was collected at the beginning of November by IOM in the Bahamas. It covers Aback Island which has been severely hit by Hurricane Dorian on 1:3 September.

The dataset has been publicly released and is available through HDX here. An initial report has been produced.

data <- read_excel("DTM R3 DB Great-Little Abaco MSLA V4.xlsx", sheet = "BD")


## For some field the value '9999' has been entered - for the sake of analysis, 
# we will assume that this correspond to answers 'no' - 0
# to replace multiple values in a data frame, looping through all columns might help.
for (i in seq_along(data)) {
    data[[i]][data[[i]] == "9999"] <- "0"
}
# This has converted numerci to character - let's bring them back to numeric
data[, c(3:ncol(data))] <- sapply(data[, c(3:ncol(data))], as.numeric)
## Warning in lapply(X = X, FUN = FUN, ...): NAs introduced by coercion
## Point of control
#View(data[ ,c("J_107_VisitInmigration")])

The data includes 188 variables and 23 records, each records corresponding to a location in Abaco.

Building the theoretical framework.

The original dataset already includes a data dictionary. The initial step is add more column to this dictionary in order to record the Severity theoretical framework.

A simple approach here is to regroup indicators by topics / composite dimensions. A composite dimension shall be apprehended as a construct i.e. a defined idea of sectoral severity that can only be measured through a number of simpler elements. A good exercise then is to try to give a definition of the type of sectoral severity that is looked at: The definition of the concept should give a clear sense of what is being measured by the composite index. It should be noted that not all indicators shall be blindly included: common sense shall be used to confirm that selected sub-indicators are indeed constructive or reflective of the actual dimension ‘construct’.

Looking at the spreadsheet from HDX, we can see that the data structure has already been reshaped / ‘hot coded’ as what is called a dummy variable so that each variable that includes modalities (categorical: i.e select_one with more than 2 modalities or select_multiple) is split into as many variable as there are modalities.

Single questions could then be extracted and had to be renamed both in terms of short code qid shortened Label for good appearance on charts qlabel. as a rule of thumb, a good variable name shall be self-explanatory (avoid q1, q2 , etc.) and very short - not more than 10 characters - A label should not be more than 80 characters and even less if possible.

Then we need to mention how the variable shall be used for the composite calculation, i.e. either “binary”, “value” or “scored” as we will see later.

### Need to implemet manually the analysis framework in the data dictionnary in order 
# to get the calculation
dico <- read_excel("DTM R3 DB Great-Little Abaco MSLA V4.xlsx", sheet = "CatPregs")


## Checking the dico matche with dataframe --
dico1 <- as.data.frame(names(data))
names(dico1)[1] <- "cod_pregunta"
dico <- plyr::join(x = dico1 , y = dico , by = "cod_pregunta", type = "left" )
# Adding variable label in data frame
for (i in 1:nrow(data)) { attributes(data)$variable.labels[ i] <- as.character(dico[ i, c("label")]) }


rm(i, dico1)

## Now creating score variable and applying same direction to all of them!

#levels(as.factor(dico$Calculation))
#labels(dico)
subindicator <- dico[ dico$Calculation %in% c("binary", "value", "scored"), ]
subindicator.unique <- as.data.frame( unique(subindicator[ ,c("qid",  "question",
                                                              "qtype" ,
                                                              "Calculation", "Dimension",
                                                              "qlabel","justification",
                                                              "Polarity"   )]))

## Display the table
subindicator.unique2 <- as.data.frame( unique(subindicator[ ,c( "Dimension","qlabel",
                                                                "qtype" , "Calculation",  
                                                                "Polarity"   )]))
row.names(subindicator.unique2) <- NULL


##  Frame with all dimensions
dimensions <- as.data.frame( unique(subindicator[ ,c( "Dimension" )]))
names(dimensions)[1] <- "Dimension"
## Get first Protection
i <- 2
  ## looping around dimensions
this.dimension <- as.character(dimensions[i,1])
  
subindicator.unique3 <- as.data.frame( subindicator.unique2[ subindicator.unique2$Dimension == this.dimension ,
                                                  c( "qlabel",  "Calculation",
                                                                "Polarity"   )])
row.names(subindicator.unique3) <- NULL  

knitr::kable(subindicator.unique3, caption = "Theoretical Framework") %>%
           kable_styling(bootstrap_options = c("striped", "bordered",
                                               "condensed", "responsive"),
                         font_size = 9)
Theoretical Framework
qlabel Calculation Polarity
Concerning hierarchy of unmet needs scored Negative (the higher score, the more severe)
Community Relation Level scored Negative (the higher score, the more severe)
Relation with Host Community scored Negative (the higher score, the more severe)
Women Good safety scored Positive (the higher score, the less severe)
Men Good Safety scored Positive (the higher score, the less severe)
Child Good Safety scored Positive (the higher score, the less severe)
Reported Security Incident binary Negative (the higher score, the more severe)
Reported Immigration Visit binary Negative (the higher score, the more severe)
Positive Referral Mechanisms binary Positive (the higher score, the less severe)
Positive Safe/Recreational Places binary Positive (the higher score, the less severe)
Positive Diversity of Information source scored Positive (the higher score, the less severe)

Calculating each sub-indicator value

The survey has mostly responses of types: ‘select_one’ and ‘select_multiple’. Indeed, collecting reliable numeric value is a well-knwon limitation of key-informant based survey.

Different calculations were used depending on the type of questions:

## Creating scores for each of those indicator ###########
## Num col where indic will start to be appended
numcol1 <- ncol(data) + 1
numcol <- ncol(data)

## looping around each new sub indicator to create
for (j in 1:nrow(subindicator.unique)) {

    # j <- 2
    this.indicator.comp <- as.character(subindicator.unique[ j, c("Calculation")])
    this.indicator.type <- as.character(subindicator.unique[ j, c("qtype")])
    this.indicator.name <- as.character(subindicator.unique[ j, c("qid")])
    this.indicator.label <- as.character(subindicator.unique[ j, c("qlabel")])
    this.indicator.question <- as.character(subindicator.unique[ j, c("question")])
    this.indicator.Polarity <- as.character(subindicator.unique[ j, c("Polarity")])
    this.indicator.Dimension <- as.character(subindicator.unique[ j, c("Dimension")])

    ## let's add a variable in the data frame and give it the indic name
    numcol <- numcol + 1
    data[ , numcol] <- ""
    names(data)[numcol] <- this.indicator.name
    attributes(data)$variable.labels[numcol] <- this.indicator.label

    ## Display where we are in the console - always usefull to debug!
    ## Take the # when playing with the script
    #cat(paste0("\n\n===  Indicator: ", j, "-",this.indicator.Dimension ,
    #           "-",this.indicator.label ,"\n",this.indicator.comp ,
    #           "-",this.indicator.Polarity ,"\n"))

    ## Now accounting for 2 distinct cases to get my calculation
    ## "binary" / "value" or "scored"

      ## If it's a score, we need to sum up all scores for that questions  #####
      # - independently of wether it's a select_one or select_multiple -
      if (this.indicator.comp == "scored" ) {
      ## get the correponding value var
        this.value.var <- as.character(subindicator[ subindicator$question == this.indicator.question,
                                                     c("cod_pregunta") ])
        this.subset <- data[ , this.value.var]

        ## Now get an apply the coeffcient
        this.subsetscore <- t(as.data.frame(subindicator[ subindicator$cod_pregunta %in% this.value.var,
                                                          c("scoremodality") ]))

      ## multiply data frame by a vector
      this.subset3 <- data.frame(mapply(`*`,this.subset, this.subsetscore, SIMPLIFY = FALSE))
      #str(this.subset3)
      # this.subset4 <- cbind(this.subset3,  rowSums(this.subset3, na.rm = TRUE))

      #cat(paste0("Calculation of Indicator: ", j, "\n"))
      ## Get the sum of the row
      data[ , numcol] <- rowSums(this.subset3, na.rm = TRUE)
    }
     

    ## If it's a  value, we just take the value of that binary  #####-
    if (this.indicator.comp %in% c("value")) {
      ## get the correponding value var
      this.value.var <- as.character(dico[ dico$question == this.indicator.question,
                                           c("cod_pregunta") ])
      ## Apply the value
      #cat(paste0("Calculation of Indicator: ", j, "\n"))
      data[ , numcol] <- data[ , this.value.var]
    }
    
    ## If it's a binary , we add 1 to avoid zero value #####-
    if (this.indicator.comp %in% c("binary")) {
      ## get the correponding value var
      this.value.var <- as.character(dico[ dico$question == this.indicator.question,
                                           c("cod_pregunta") ])
      
      #cat(paste0("Calculation of Indicator: ", j, "\n"))
      
      ## Apply the value + 1 to avoid zero value
      data[ , numcol] <- data[ , this.value.var] + 1
      
      ## Ridit transformation
      # data[ , numcol] <- (cumsum(data[ , this.value.var]) - .5 *
      #                       data[ , this.value.var]) /
      #                          sum(data[ , this.value.var])
      
    }
      
    ## clean
    rm(this.indicator.comp, this.indicator.type, this.indicator.name,
       this.indicator.label, this.indicator.question, this.indicator.Polarity ,
       this.indicator.Dimension, this.subset, this.subset3, this.subsetscore, this.value.var )
}

Sub-Indicator Polarity & Data Normalization,

We can now isolate our new numeric and scaled indicators within a matrix as this object type will be required for the rest of the calculations.

First, we need to eliminate indicators with poor discrimination capacity, i.e. when all values are the sames.

## Double Checking results
#View(data[ , numcol1:ncol(data)])
indic <- data[ , numcol1:ncol(data)]

### Remove var when standard deviation is 0
indic2 <- indic[, sapply(indic, function(x) { sd(x) != 0} )]

## For some field, we still have indic with zero value - 
## This a data quality issue that shows that some modalities were missing 
# for select_one or select_multiple
### We assume that this account for not severe situation - i.e. replaced by one
for (i in seq_along(indic2)) {
    indic2[[i]][indic2[[i]] == "0"] <- "1"
}
# This has converted from numeric to character - let's bring them back to numeric
indic2 <- sapply(indic2, as.integer)

## Refresh my dictionnary of indic
subindicator.unique2 <- subindicator.unique[ subindicator.unique$qid
                                             %in% names(as.data.frame(indic2)), ]

## Transform this object as a matrix and inject location name as row.names
indic.matrix <- as.matrix(indic2)
row.names(indic.matrix) <- data$C_101_name

The polarity of a sub-indicator is the sign of the relationship between the indicator and the phenomenon to be measured (e.g., in a well-being index, “GDP per capita” has ‘positive’ polarity and “Unemployment rate” has ‘negative’ polarity). In this case, we have 2 options for such directional adjustments:

  • “Negative (the higher score, the more severe)”
  • “Positive (the higher score, the less severe)”

This component is accounted for during the normalization process below.

Data Normalization allows for Adjustments of distribution (similar range of variation) and scale (common scale) of sub-indicators that may reflect different units of measurement and different ranges of variation.

Due to the structure of the indicators, distinct approaches of normalization shall be considered in order to avoid having zero value that would create issues for geometric means aggregation. Different normalization methods are available through the function normalise_ci and lead to different results:

  • A z-score approach method = 1 (Imposes a distribution with mean zero and variance 1). Standardized scores which are below average will become negative, implying that further geometric aggregation will be prevented.

  • A min-max approach method = 2 (same range of variation [0,1]) but not same variance).This method is very sensitive to extreme values/outliers

  • A ranking method method = 3. Scores are replaced by ranks – e.g. the highest score receives the first ranking position (rank 1).

## Retrieve polarity from dictionnary
for (i in 1:nrow(subindicator.unique2)) {
  if (subindicator.unique2[ i, c("Polarity")] == "Negative (the higher score, the more severe)")  
    {subindicator.unique2[ i, c("polarity")]  <- "NEG"} else 
    {subindicator.unique2[ i, c("polarity")]  <- "POS"}
 }
subindicator.unique2$dir <- 1

## Case 1
subindicator.scored <- subindicator.unique2[ subindicator.unique2$Calculation == "scored", ]
var.scored <- as.character(subindicator.scored[ , c("qid") ])
indic.matrix.scored <- indic.matrix[ , var.scored ]
indic.matrix.scored.obj <- normalise_ci(indic.matrix.scored,
                                     c(1:ncol(indic.matrix.scored)),
                                     polarity =  as.character(subindicator.scored$polarity ),
                                     method = 3)

## Case 2
# subindicator.value <- subindicator.unique2[ subindicator.unique2$Calculation == "value", ]
# var.value <- as.character(subindicator.value[ , c("qid") ])
# indic.matrix.value <- indic.matrix[ , var.value ]
# indic.matrix.value.obj <- normalise_ci(indic.matrix.value,
#                                      c(1:ncol(indic.matrix.value)),
#                                      polarity =  as.character(subindicator.value$polarity ),
#                                      method = 2)

## Case 3
subindicator.binary <- subindicator.unique2[ subindicator.unique2$Calculation == "binary", ]
var.binary <- as.character(subindicator.binary[ , c("qid") ])
indic.matrix.binary <- indic.matrix[ , var.binary ]
indic.matrix.binary.obj <- normalise_ci(indic.matrix.binary,
                                     c(1:ncol(indic.matrix.binary)),
                                     polarity =  as.character(subindicator.binary$polarity ),
                                     method = 3)

## Binding this together so that we have the full normalised matrix
indic.matrix.norm <- cbind(indic.matrix.scored.obj$ci_norm,
                          # indic.matrix.value.obj$ci_norm,
                           indic.matrix.binary.obj$ci_norm)

## Clean the work environment  
rm( subindicator.unique,
 #  subindicator.value, var.value , indic.matrix.value, indic.matrix.value.obj,
   subindicator.binary, var.binary , indic.matrix.binary, indic.matrix.binary.obj,
   subindicator.scored, var.scored , indic.matrix.scored, indic.matrix.scored.obj   )

Correlation analysis

The investigation of the structure of simple indicators can be done by means of correlation analysis.

We will check such correlation first within each dimension, using the ggcorrplot package. ’

##  Frame with all dimensions
# dimensions <- as.data.frame( unique(subindicator[ ,c( "Dimension" )]))
# names(dimensions)[1] <- "Dimension"

## Creating severity subindice on each dimensions with Data Envelopment analysis #####

#for (i in 1:nrow(dimensions)) {
#for (i in 1:2) {  
  # i <- 2
  # ## looping around dimensions
  # this.dimension <- as.character(dimensions[i,1])
  ## subset related indicator names
  this.indicators <- as.character(subindicator.unique2[ subindicator.unique2$Dimension == this.dimension,
                                                        c("qid") ])
  this.indicators.label <- as.character(subindicator.unique2[ subindicator.unique2$Dimension == this.dimension,
                                                        c("qlabel") ])
  ##subset matrix & df
  this.indic.matrix.norm <- indic.matrix[ , this.indicators]
  this.indic.df <- indic2[ , this.indicators]

### Check correlation
corr.matrix <- cor(this.indic.matrix.norm, method = "pearson",  use = "pairwise.complete.obs")
  
  
## replace with Label inside the matrix
corr.matrix1 <- corr.matrix
rownames(corr.matrix1) <- as.character(subindicator.unique2[subindicator.unique2$qid %in% rownames(corr.matrix), c("qlabel")])
colnames(corr.matrix1) <- as.character(subindicator.unique2[subindicator.unique2$qid %in% colnames(corr.matrix), c("qlabel")])

plot1 <- ggcorrplot(corr.matrix1 ,
                    method = "circle",
                    hc.order = TRUE,
                    type = "upper") +
  labs(title = paste0( "Severity Indicators for ",this.dimension ),
       subtitle = "Identified Correlation between indicators",
       caption = "Correlation level = dot size, Positive Correlation  = Red - Negative = Blue",
       x = NULL, y = NULL) +
  bbc_style() +
  theme( plot.title = element_text(size = 13),
         plot.subtitle = element_text(size = 11),
         plot.caption = element_text(size = 7, hjust = 1),
         axis.text = element_text(size = 7),
         strip.text.x = element_text(size = 7),
         axis.text.x = element_text(angle = 45, hjust = 1),
         legend.position = "top",
         legend.box = "horizontal",
         legend.text = element_text(size = 9),
         panel.grid.major.x = element_line(color = "#cbcbcb"),
         panel.grid.major.y = element_line(color = "#cbcbcb"))
ggpubr::ggarrange(left_align(plot1, c("subtitle", "title")), ncol = 1, nrow = 1)

Another approach to better visualize correlation between indicators is to represent them through a network with the ggpraph package.

qgraph(cor(this.indic.matrix.norm),
     # shape = "circle",
     # posCol = "darkgreen",
     # negCol = "darkred",
     # threshold = "bonferroni", #The threshold argument can be used to remove edges that are not significant.
     # sampleSize = nrow(scores.this.norm),
     # graph = "glasso",
       esize = 35, ## Size of node
       vsize = 6,
       vTrans = 600,
       posCol = "#003399", ## Color positive correlation Dark powder blue
       negCol = "#FF9933", ## Color negative correlation Deep Saffron
       alpha = 0.05,
       cut = 0.4, ## cut off value for correlation
       maximum = 1, ## cut off value for correlation
       palette = 'pastel', # adjusting colors
       borders = TRUE,
       details = FALSE,
       layout = "spring",
       nodeNames = this.indicators.label ,
       legend.cex = 0.4,
       title = paste0("Correlations Network for severity indicators related ",this.dimension ),
       line = -2,
       cex.main = 2)

Consistency between indicators

Cronbach’s alpha, (or coefficient alpha), developed by Lee Cronbach in 1951, measures reliability (i.e. how well a test measures what it should: measure of the stability of test scores), or internal consistency.

As a rule of thumbs, a score of more than 0.7 indicates an acceptable level of consistency:

  • A high level for alpha may mean that all indicators are highly correlated (meaning we have redundant indicators representing the same thing…).
  • A low value for alpha may mean that there are not enough indicators or that the indicators are poorly interrelated.

Cronbach.this <- psych::alpha(this.indic.matrix.norm, check.keys = TRUE)

cat(paste0("The Cronbach Alpha measure of consistency for this combination of indicators is  ", round(Cronbach.this$total$std.alpha, 2), "\n." ) )
The Cronbach Alpha measure of consistency for this combination of indicators is  0.75
.

Aggregation & Weighting

For weighting, the main issue to address is related to the concept of compensability. Namely the question is to know to what extent can we accept that the high score of an indicator go to compensate the low score of another indicator? This problem of compensability is intertwined with the issue of attribution of weights for each sub-indicator in order to calculate the final aggregation.

We can foresee that using “equal weight” (all indicators account for the same in the final index) and “arithmetic aggregation” (all indicators are substituable) is unlikely to depict the complex issue of Humanitarian Severity and is likely to comes with the risk of misrepresenting the reality.

Various methods are available within the Compind package are described below. This R package is also available through a ShinyApp. We will then share the code to use them based on our example.

Benefit of the Doubt approach (BoD)

This method is the application of Data Envelopment Analysis (DEA) to the field of composite indicators. It was originally proposed by Melyn and Moesen (1991) to evaluate macroeconomic performance. ACAPS has prepared an excellent note on The use of data envelopment analysis to calculate priority scores in needs assessments.

BoD approach offers several advantages:

  • Weights are endogenously determined by the observed performances and benchmark is not based on theoretical bounds, but it’s a linear combination of the observed best performances.

  • Principle is easy to communicate: since we are not sure about the right weights, we look for ”benefit of the doubt” weights (such that your overall relative performance index is as high as possible).

CI_BoD_estimated = ci_bod(this.indic.matrix.norm,
                          indic_col = (1:ncol(this.indic.matrix.norm)))

ci_bod_est <- as.data.frame( CI_BoD_estimated$ci_bod_est)
names(ci_bod_est) <- "Benef_Doubt"

Directional Benefit of the Doubt (D-BoD)

Directional Benefit of the Doubt (D-BoD) model enhance non-compensatory property by introducing directional penalties in a standard BoD model in order to consider the preference structure among simple indicators. This method is described in the article Enhancing non compensatory composite indicators: a directional proposal.

## Endogenous weight - no zero weight --
CI_Direct_BoD_estimated <-  ci_bod_dir(this.indic.matrix.norm,
                                       indic_col = (1:ncol(this.indic.matrix.norm)),
                                       dir = as.numeric(subindicator.unique2[subindicator.unique2$Dimension == this.dimension , c("dir")] ))

ci_bod_dir_est <- data.frame(CI_Direct_BoD_estimated$ci_bod_dir_est)
names(ci_bod_dir_est) <- "Benef_Doubt_Dir"

Robust Benefit of the Doubt approach (RBoD)

ci_rbod_est is the robust version of the BoD method. It is based on the concept of the expected minimum input function of order-m so “in place of looking for the lower boundary of the support of F, as was typically the case for the full-frontier (DEA or FDH), the order-m efficiency score can be viewed as the expectation of the maximal score, when compared to m units randomly drawn from the population of units presenting a greater level of simple indicators”, Daraio and Simar (2005). This method is described with more detail in the article Robust weighted composite indicators by means of frontier methods with an application to European infrastructure endowment.

CI_RBoD_estimated <-  ci_rbod(this.indic.matrix.norm,
                              indic_col = (1:ncol(this.indic.matrix.norm)),
                              M = 20,  #The number of elements in each sample.
                              B = 200) #The number of bootstap replicates.

ci_rbod_est <- data.frame(CI_RBoD_estimated$ci_rbod_est)
names(ci_rbod_est) <- "Benef_Doubt_Rob"

Benefit of the Doubt approach (BoD) index with constraints on weights

ci_bod_constr_est_mpi allows for constraints (so that there is none not valued) and with a penalty as proposed by Mazziotta - Pareto (also adopted by the Italian National Institute of Statistics). This method is described in the article Geometric mean quantity index numbers with Benefit-of-the-Doubt weights

CI_BoD_MPI_estimated = ci_bod_constr_mpi(this.indic.matrix.norm,
                                          indic_col = (1:ncol(this.indic.matrix.norm)),
                                          up_w = 1,
                                          low_w = 0.1,
                                          penalty = "POS")

ci_bod_constr_est_mpi <- data.frame(CI_BoD_MPI_estimated$ci_bod_constr_est_mpi)
names(ci_bod_constr_est_mpi) <- "Benef_Doubt_Cons"

Factor analysis

ci_factor_est groups together simple indicators to estimate a composite indicator that captures as much as possible of the information common to individual indicators.

##  Doing PCA with ci_factor.R
# If method = "ONE" (default) the composite indicator estimated values are equal to first component scores;
# if method = "ALL" the composite indicator estimated values are equal to component score multiplied by its proportion variance;
# if method = "CH" it can be choose the number of the component to take into account.
dimfactor <- ifelse(ncol(this.indic.matrix.norm) > 2, 3, ncol(this.indic.matrix.norm))
CI_Factor_estimated <-  ci_factor(this.indic.matrix.norm,
                                  indic_col = (1:ncol(this.indic.matrix.norm)),
                                  method = "CH",  # if method = "CH" it can be choose the number of the component to take into account.
                                  dim = dimfactor)
ci_factor_est <- data.frame( CI_Factor_estimated$ci_factor_est)
names(ci_factor_est) <- "Factor"

Mean-Min Function (MMF)

ci_mean_min_est is an intermediate case between arithmetic mean, according to which no unbalance is penalized, and min function, according to which the penalization is maximum. It depends on two parameters that are respectively related to the intensity of penalization of unbalance (alpha) and intensity of complementarity (beta) among indicators. “An unbalance adjustment method for development indicators”

CI_mean_min_estimated <- ci_mean_min(this.indic.matrix.norm,
                                     indic_col = (1:ncol(this.indic.matrix.norm)),
                                     alpha = 0.5,  #intensity of penalization of unbalance  (alpha)
                                     beta = 1) # intensity of complementarity (beta) among indicators

ci_mean_min_est <- data.frame( CI_mean_min_estimated$ci_mean_min_est)
names(ci_mean_min_est) <- "Mean_Min"

Geometric aggregation

This method uses the geometric mean to aggregate the single indicators and therefore allows to bypass the full compensability hypothesis using geometric mean. Two weighting criteria are possible: EQUAL: equal weighting and BOD: Benefit-of-the-Doubt weights following the Puyenbroeck and Rogge (2017) approach.

CI_Geom_estimated = ci_geom_gen(this.indic.matrix.norm,
                                indic_col = (1:ncol(this.indic.matrix.norm)),
                                meth = "EQUAL",
                                ## "EQUAL" = Equal weighting set, "BOD" = Benefit-of-the-Doubt weighting set.
                                up_w = 1,
                                low_w = 0.1,
                                bench = 1)
# Row number of the benchmark unit used to normalize the data.frame x.

ci_mean_geom_est <- data.frame( CI_Geom_estimated$ci_mean_geom_est)
names(ci_mean_geom_est) <- "Mean_Geom"

Mazziotta-Pareto Index (MPI)

This method is is a non-linear composite index method which transforms a set of individual indicators in standardized variables and summarizes them using an arithmetic mean adjusted by a “penalty” coefficient related to the variability of each unit (method of the coefficient of variation penalty).

CI_MPI_estimated <- ci_mpi(this.indic.matrix.norm,
                           indic_col = (1:ncol(this.indic.matrix.norm)),
                           penalty = "NEG")  # Penalty direction; ”POS” (default) in case of increasing
#  or “positive” composite index (e.g., well-being index),
#  ”NEG” in case of decreasing or “negative” composite
#  index (e.g., poverty index).

ci_mpi_est <- data.frame( CI_MPI_estimated$ci_mpi_est)
names(ci_mpi_est) <- "Mazziotta_Pareto"

Wroclaw taxonomy method

This last method (also known as the dendric method), originally developed at the University of Wroclaw, is based on the distance from a theoretical unit characterized by the best performance for all indicators considered; the composite indicator is therefore based on the sum of euclidean distances from the ideal unit and normalized by a measure of variability of these distance (mean + 2*std).

CI_wroclaw_estimated <-  ci_wroclaw(this.indic.matrix.norm,
                                    indic_col = (1:ncol(this.indic.matrix.norm)))

ci_wroclaw_est <- data.frame( CI_wroclaw_estimated$ci_wroclaw_est)
names(ci_wroclaw_est) <- "Wroclaw"

Visualise output

In a table

this.indic.matrix.norm2 <- cbind( #row.names(scores.this),
  ci_bod_est, # Benefit of the Doubt approach
  ci_rbod_est, # Robust Benefit of the Doubt approach
  ci_bod_dir_est, # Directional Robust Benefit of the Doubt approach
  ci_bod_constr_est_mpi, # Robust Benefit of the Doubt approach with constraint
  ci_factor_est, # Factor analysis  componnents
  ci_mean_geom_est, # Geometric aggregation
  ci_mean_min_est, # Mean-Min Function
  ci_mpi_est, # Mazziotta-Pareto Index
  ci_wroclaw_est) # Wroclaw taxonomy method


kable(this.indic.matrix.norm2, caption = "Composite with different algorithm") %>%
           kable_styling(bootstrap_options = c("striped", "bordered", "condensed", "responsive"), font_size = 9)
Composite with different algorithm
Benef_Doubt Benef_Doubt_Rob Benef_Doubt_Dir Benef_Doubt_Cons Factor Mean_Geom Mean_Min Mazziotta_Pareto Wroclaw
Bahama Palm Shores 1.0000000 1.282425 1.00 0 0.3446640 2.136049 99.36180 105.24732 0.2123878
Bahamas Coral Island 0.7500000 1.000000 0.75 0 -0.5217482 1.000000 89.27198 92.18082 0.8722567
Blackwood 1.0000000 1.234568 1.00 0 0.6741554 2.233855 111.93174 160.84403 0.5982153
Casuarina Point 0.9285714 1.043932 0.88 0 0.2139714 1.782163 107.05468 149.44566 0.4588102
Cedar Harbour 1.0000000 1.005025 1.00 0 0.0323310 1.849080 103.16969 127.58559 0.6106535
Central Pines 0.7500000 1.000000 0.75 0 -0.5172230 1.065041 89.87882 92.90228 0.8424421
Cherokee Sound 1.0000000 1.028278 1.00 0 0.2312486 2.021530 97.69519 113.18740 0.5538397
Cooper’s Town 1.0000000 1.006289 1.00 0 -0.1221133 1.655507 95.85798 110.00777 0.7086671
Crossing Rocks 1.0000000 1.003764 1.00 0 0.3110427 1.717668 108.36854 161.97803 0.6738644
Crown Heaven 1.0000000 1.277955 1.00 0 0.1761553 2.210063 120.81317 224.33572 0.5173436
Dundas Town 1.0000000 1.006711 1.00 0 -0.5411769 1.253451 90.98887 95.97269 0.7578805
Fire Road 0.8000000 1.000000 0.75 0 -0.2561360 1.286665 93.22568 94.99688 0.8189106
Fox Town 1.0000000 1.007557 1.00 0 0.2488732 1.763183 99.20790 123.10691 0.6106535
Great Cistern 1.0000000 1.273885 1.00 0 0.3307981 1.586261 104.37256 149.89893 0.6671368
Leisure Lee 1.0000000 1.000000 1.00 0 0.0824339 1.588053 100.91571 125.54331 0.7191976
Little Harbour 1.0000000 1.242236 1.00 0 0.1131143 1.612772 102.99653 139.00087 0.8298542
Marsh Harbour 1.0000000 1.072386 1.00 0 -0.2550090 1.441870 94.45670 104.67610 0.5456343
Mount Hope 1.0000000 1.210898 1.00 0 0.3087314 2.097435 99.72173 122.84750 0.4588102
Murphy Town 1.0000000 1.010952 1.00 0 -0.4190248 1.253451 92.79538 101.19725 0.7578805
Sandy Point 1.0000000 1.262810 1.00 0 0.1677827 2.133087 129.67997 351.97143 0.1977223
Spring City 0.8000000 1.000000 0.75 0 -0.2561360 1.286665 95.79409 110.33842 0.8189106
Treasure Cay 1.0000000 1.067616 1.00 0 -0.1288893 1.736159 100.65568 122.31697 0.7086671
Wood Cay 1.0000000 1.111111 1.00 0 -0.2178457 1.717668 97.74205 101.76152 0.7128980

As we can see some of the potential aggregation algorithm are not providing results from some location. We will therefore exclude them from the rest of the analysis.

## Eliminate automatically method that could not score some elements
this.indic.matrix.norm22 <- this.indic.matrix.norm2[, colSums(this.indic.matrix.norm2 != 0, na.rm = TRUE) > 0]

## Check if sum is  zero
this.indic.matrix.norm22 <- this.indic.matrix.norm22[, colSums(this.indic.matrix.norm22 != 0, na.rm = TRUE)  == nrow(this.indic.matrix.norm22)]

## Remove indic if standard deviation is o
this.indic.matrix.norm22 <- this.indic.matrix.norm22[, sapply(this.indic.matrix.norm22, function(x) { sd(x) != 0} )]

## Remove “NaN” or “Not a Number
this.indic.matrix.norm22 %>%
  summarise_all(function(x) sum(x[!is.na(x)] == "NaN") == length(x[!is.na(x)])) %>% # check if number of NaN is equal to number of rows after removing NAs
  select_if(function(x) x == FALSE) %>%       # select columns that don't have only nulls
  names() -> vars_to_keep                     # keep column names

# select columns captured above
this.indic.matrix.norm22 <- this.indic.matrix.norm22[ , vars_to_keep]                 



rm( ci_bod_constr_est_mpi, # Robust Benefit of the Doubt approach with constraint
  CI_BoD_MPI_estimated,
  ci_bod_est, # Benefit of the Doubt approach
  CI_BoD_estimated,
  ci_bod_dir_est, # Directional Robust Benefit of the Doubt approach
  CI_Direct_BoD_estimated,
  ci_rbod_est, # Robust Benefit of the Doubt approach
  CI_RBoD_estimated,
  ci_factor_est, # Factor analysis  componnents,
  dimfactor,
  CI_Factor_estimated,
  ci_mean_min_est, # Mean-Min Function
  CI_mean_min_estimated,
  ci_mpi_est, # Mazziotta-Pareto Index
  CI_MPI_estimated,
  ci_mean_geom_est, # Geometric aggregation
  CI_Geom_estimated,
  ci_wroclaw_est,# Wroclaw taxonomy method
  CI_wroclaw_estimated) 

Differences between algorithm

The various Index can be normalised again on a 0 to 1 scale in order to be compared. A specific treatment if necessary for index based on Factor analysis

## Factor analysis can provide negative rank - If it works we need to get rid of them
for (j in 1:nrow(this.indic.matrix.norm22)) {
this.indic.matrix.norm22[j ,c("Factor")] <- ifelse( min(this.indic.matrix.norm22[ ,c("Factor")]) < 0 ,
                                                     this.indic.matrix.norm22[j ,c("Factor")] + abs(min(this.indic.matrix.norm22[ ,c("Factor")])) ,
                                                     this.indic.matrix.norm22[j ,c("Factor")]) 
}

kept.methodo <- as.data.frame(names(this.indic.matrix.norm22))
kept.methodo$polarity <- "POS"
polarity2 <- as.character(kept.methodo$polarity)


this.indic.matrix.norm3 <- normalise_ci(this.indic.matrix.norm22,
                                        c(1:ncol(this.indic.matrix.norm22)),
                                        polarity =  polarity2,
                                        method = 3)
this.indic.matrix.norm3 <- this.indic.matrix.norm3$ci_norm

kable(this.indic.matrix.norm3, caption = "Location Ranking with different algorithms") %>%
           kable_styling(bootstrap_options = c("striped", "bordered", "condensed", "responsive"), font_size = 9)
Location Ranking with different algorithms
Benef_Doubt Benef_Doubt_Rob Benef_Doubt_Dir Factor Mean_Geom Mean_Min Mazziotta_Pareto Wroclaw
Bahama Palm Shores 14.5 23 14.5 22.0 21.0 12 8 2.0
Bahamas Coral Island 1.5 3 3.0 5.0 1.0 1 1 23.0
Blackwood 14.5 18 14.5 23.0 23.0 21 20 8.0
Casuarina Point 5.0 13 5.0 19.0 16.0 19 18 3.5
Cedar Harbour 14.5 7 14.5 14.0 17.0 17 16 9.5
Central Pines 1.5 3 1.0 6.0 2.0 2 2 22.0
Cherokee Sound 14.5 12 14.5 20.0 18.0 9 11 7.0
Cooper’s Town 14.5 8 14.5 10.0 11.0 8 9 13.5
Crossing Rocks 14.5 6 14.5 21.0 13.0 20 21 12.0
Crown Heaven 14.5 22 14.5 16.0 22.0 22 22 5.0
Dundas Town 14.5 9 14.5 2.5 3.5 3 4 17.5
Fire Road 3.5 3 3.0 8.0 5.5 5 3 19.5
Fox Town 14.5 10 14.5 15.0 15.0 11 14 9.5
Great Cistern 14.5 21 14.5 18.0 8.0 18 19 11.0
Leisure Lee 14.5 3 14.5 12.0 9.0 15 15 16.0
Little Harbour 14.5 19 14.5 13.0 10.0 16 17 21.0
Marsh Harbour 14.5 15 14.5 9.0 7.0 6 7 6.0
Mount Hope 14.5 17 14.5 17.0 19.0 13 13 3.5
Murphy Town 14.5 11 14.5 2.5 3.5 4 5 17.5
Sandy Point 14.5 20 14.5 11.0 20.0 23 23 1.0
Spring City 3.5 3 3.0 2.5 5.5 7 10 19.5
Treasure Cay 14.5 14 14.5 7.0 14.0 14 12 13.5
Wood Cay 14.5 16 14.5 2.5 12.0 10 6 15.0

Let’s write this back to the excel doc

datawrite <- cbind(this.indic.df,
  this.indic.matrix.norm,
  this.indic.matrix.norm22,
  this.indic.matrix.norm3 )
wb <- loadWorkbook("DTM R3 DB Great-Little Abaco MSLA V4.xlsx")
choicesSheet <- xlsx::createSheet(wb, sheetName = this.dimension)
xlsx::addDataFrame(datawrite, choicesSheet, col.names = TRUE, row.names = TRUE)
xlsx::saveWorkbook(wb, "DTM R3 DB Great-Little Abaco MSLA V5.xlsx")

We can now build a visualisation for the comparison between different valid methods.

## Remove NaN
this.indic.matrix.norm4 <-  this.indic.matrix.norm3[,colSums(this.indic.matrix.norm3 != 0, na.rm = TRUE) > 0]

## keep that frame for later on for the viz
assign(  paste("scores.", this.dimension, sep = ""), this.indic.matrix.norm3 )

## Add blank variable for nice chart display
this.indic.matrix.norm4$Location <- NA
  
this.indic.matrix.norm4.melt <- melt(as.matrix(this.indic.matrix.norm4))


#Make plot
line <- ggplot(this.indic.matrix.norm4.melt, aes(x = Var2,
                                                 y = value,
                                                 color = Var1,
                                                 group = Var1)) +
  geom_line(size = 2) +
  scale_colour_manual(values = c("#8dd3c7","#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C",
                                 "#FB9A99", "#E31A1C", "#FDBF6F", "#FF7F00", "#CAB2D6",
                                 "#6A3D9A", "#fb8072", "#B15928", "#fdb462","#ccebc5",
                                 "#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442",
                                 "#0072B2", "#D55E00", "#CC79A7")) + ## as many color as locations.. 23
  geom_text_repel(
    data = this.indic.matrix.norm4.melt[ this.indic.matrix.norm4.melt$Var2 == "Wroclaw", ],
    aes(label = Var1),
    arrow = arrow(length = unit(0.03, "npc"), type = "closed", ends = "first"),
    direction = "y",
    size = 4,
    nudge_x = 45 ) +
  labs(title = paste0("Rank for Composite Indicator on ",  this.dimension ),
       subtitle = "Based on various weighting approach") +
  bbc_style() +
  theme( plot.title = element_text(size = 13),
         plot.subtitle = element_text(size = 11),
         plot.caption = element_text(size = 7, hjust = 1),
         axis.text = element_text(size = 10),
         strip.text.x = element_text(size = 11),
         panel.grid.major.x = element_line(color = "#cbcbcb"),
         panel.grid.major.y = element_blank(),
         axis.text.x = element_text(angle = 45, hjust = 1),
         legend.position = "none")

print(ggpubr::ggarrange(left_align(line, c("subtitle", "title")), ncol = 1, nrow = 1))

Severity Index on a map

## For mapping we normalise on mena-max
var <- c("Benef_Doubt", 
           "Benef_Doubt_Rob", 
           "Benef_Doubt_Dir",
           "Benef_Doubt_Cons",
           "Factor",
           "Mean_Geom",       
           "Mean_Min",
           "Mazziotta_Pareto",
           "Wroclaw" )

label <- c( "Benefit of the Doubt approach",
               "Robust Benefit of the Doubt approach",
               "Directional Robust Benefit of the Doubt approach",
               "Robust Benefit of the Doubt approach with constraint",
               "Factor analysis  componnents",
               "Geometric aggregation",
               "Mean-Min Function",
               "Mazziotta-Pareto Index",
               "Wroclaw taxonomy method")
polarity <- c( "POS",
                "POS",
                "POS",
                "POS",
                "POS",
                "POS",
                "POS",
                "POS",
                "NEG")

All.method <- as.data.frame( cbind(var, label,polarity))

## Filter method that are kept
All.method2 <- All.method[All.method$var %in% names(this.indic.matrix.norm22), ]

## Filter method that are kept
All.method2 <- All.method[All.method$var %in% names(this.indic.matrix.norm22), ]
polarity2 <- as.character(All.method2[ , c("polarity")])

this.indic.matrix.norm3 <- normalise_ci(this.indic.matrix.norm22,
                                        c(1:ncol(this.indic.matrix.norm22)),
                                        polarity =  polarity2,
                                        method = 2)
this.indic.matrix.norm3 <- this.indic.matrix.norm3$ci_norm * 100

## Creating a spatial Point data frame with coordinates
abaco <- SpatialPointsDataFrame(data[ ,c("C_103_lon","C_102_lat")], ## Coord
                                cbind(as.data.frame(this.indic.matrix.norm3),data[ ,c("D_101_fams","D_102_inds","C_101_name")] ), ## Composite with different algo - together with population
                                proj4string = CRS("+init=epsg:4326") ) ## Define projection syst 

## Create equivalent polygo with voronoi
# Using the union of points instead of the original sf object works:
voronoi <- st_voronoi(st_union(st_as_sf(abaco)))
# Clid boundary instead of the original island boundaries


abacomask <- readOGR( paste0(getwd(),"/abaco.geojson"))
## OGR data source with driver: GeoJSON 
## Source: "/home/edouard/R-project/severity-index-bahamas/abaco.geojson", layer: "abaco"
## with 1 features
## It has 5 fields
abacopoly <- st_intersection(st_union(st_as_sf(abacomask)),
                             st_cast(voronoi))
abacopoly2 <- as(abacopoly, "Spatial")

rownames(abaco@data) <- NULL
abaco@data$ID <- rownames(abaco@data)
abacodata <- as.data.frame(abaco@data)
rownames(abacodata) <- paste0("ID",abaco$ID)
abacopoly3 <- SpatialPolygonsDataFrame(abacopoly2, abacodata)

Get the background map from OSM. This requires a bit of testing to have the correct background aligned with the

#plot(abaco)
# Create a boundary box


## https://www.openstreetmap.org/export#map=9/26.4226/-77.3259
longitudes <- c(-78.3, -76.3)
latitudes <- c(27.09, 25.75)
bounding_box <- SpatialPointsDataFrame( as.data.frame(cbind(longitudes, latitudes)),
                                        as.data.frame(cbind(longitudes, latitudes)) , 
                                        proj4string = CRS("+init=epsg:4326") )

# download osm tiles
abaco.osm <- getTiles(
   x = bounding_box,
   type = "osm",
   zoom = 11,
   crop = TRUE,
   verbose = FALSE
  )
## Data and map tiles sources:
## © OpenStreetMap contributors. Tiles style under CC BY-SA, www.openstreetmap.org/copyright.
#tilesLayer(x = abaco.osm)

Generating now maps for each approach through a loop.

for (h in 1:nrow(All.method2)) {
  #h <- 1
  this.var <- as.character(All.method2[ h, c("var")])
  this.label <- as.character(All.method2[ h, c("label")])
  
  
  # get Map Background
  tilesLayer(x = abaco.osm)
  #Plot symbols with choropleth coloration
  propSymbolsChoroLayer(
    spdf = abaco,
    var = "D_102_inds",
    inches = 0.15, ## size of the biggest symbol (radius for circles, width for squares, height for bars) in inches.
    border = "grey50",
    lwd = 1, #width of symbols borders.
    legend.var.pos = "topright",
    legend.var.title.txt = "Population Size\n(# of Individual)",
    var2 = this.var ,
    #classification method; one of "sd", "equal", "quantile", "fisher-jenks","q6", "geom", "arith", "em" or "msd" 
    method = "quantile",  
    nclass = 5,
    col = carto.pal(pal1 = "sand.pal", n1 = 6),
    #legend.var2.values.rnd = -2,
    legend.var2.pos = "left",
    legend.var2.title.txt = "Index Value\n (scale 0 to 100)",
    legend.var.style = "e",
    add = TRUE
  )
  # Layout
  layoutLayer(title = paste0("Protection Index - based on ",this.label ),
              author = "Protection Working Group, Abaco - The Bahamas, November 2019",
              sources = "Source: Key Informant Interview - HDX/DTM/IOM", 
              tabtitle = TRUE, 
              frame = FALSE ,
              bg = "#aad3df",
              scale = "auto")
  # North arrow
  north(pos = "topleft")
  
  ### Now getting second  smoothed map
  smoothLayer(
    spdf = abacopoly3,
    var = this.var ,
    ###  "pareto" (means power law) or "exponential"
    typefct = "pareto",
    # distance where the density of probability of the spatial interaction function equals 0.5.
    span = 5000,
    # impedance factor for the spatial interaction function.
    beta = 8, 
    nclass = 5,
    col = carto.pal(pal1 = 'brown.pal', n1 = 6),
    border = "grey",
    lwd = 0.1,
    mask = abacopoly,
    #legend.values.rnd = -3,
    legend.title.txt = "Severity\nIndex",
    legend.pos = "topright"
  )
  # annotation on the map
  text(x = 692582, y = 1611478, cex = 0.8, adj = 0, font = 3,  labels =
         "Distance function:\n- type = pareto\n- beta = 8\n- span = 5 km")
  # Layout
  layoutLayer(title = paste0("Protection Index Spread - based on ",this.label ),
              author = "Protection Working Group, Abaco - The Bahamas, November 2019",
              sources = "Source: Key Informant Interview - HDX/DTM/IOM",
              tabtitle = TRUE,
              frame = FALSE ,
              bg = "#aad3df",
              scale = "auto")
  # North arrow
  north(pos = "topleft")

  
}